Enhancing mass spectrometry imaging accessibility using convolutional autoencoders for deriving hypoxia-associated peptides from tumors

Mass spectrometry imaging (MSI) enables the study of cancer’s intratumoral heterogeneity through spatially resolved peptides, metabolites and lipids. Yet, in biomedical research, MSI is rarely used for biomarker discovery. Besides the high dimensionality and multicollinearity of the data, mass spectrometry (MS) technologies typically output mass-to-charge ratio values but not the biochemical compounds of interest. Our framework makes signals in MSI, particularly low-abundance ones, more accessible. We utilized convolutional autoencoders to aggregate features associated with tumor hypoxia, a parameter with significant spatial heterogeneity, in cancer xenograft models. We show that MSI captures these low-abundance signals and that autoencoders can preserve them in their latent space. The relevance of individual hyperparameters is demonstrated through ablation experiments, and the contribution of original features to latent features is unraveled. By complementing MSI with tandem MS from the same tumor model, multiple hypoxia-associated peptide candidates were derived. Compared to random forests alone, our autoencoder approach yielded more biologically relevant insights for biomarker discovery.

that the latent features are either too unspecific, resulting in too many associations, or too abstract, resulting in only a few associations. Supplementary Figure 6c shows the structural similarity index measure (SSIM) scores of both runs compared to the exemplary run of the unsupervised ConvAE approach with a patch size of 3 and a kernel size of 2. Of note, some individual runs with a kernel size of 4 achieved similar or even slightly better SSIM scores than the exemplary run with a kernel size of 2 (Supplementary Figure 7). Overall, however, results obtained with a larger kernel size exhibited lower SSIM scores and were generally more diverse and less deterministic.
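The SSIM comparison used throughout this section can be illustrated with a minimal sketch. It computes one global SSIM value over two equally sized ion images in plain Python. Standard implementations average SSIM over local sliding windows, so this single-window variant is a simplification. The function name `global_ssim`, the constants `k1 = 0.01` and `k2 = 0.03`, and the toy images are assumptions for illustration and are not taken from the study.

```python
def global_ssim(img_a, img_b, data_range=1.0, k1=0.01, k2=0.03):
    """Global (single-window) SSIM between two equally sized 2D images.

    Simplification: real SSIM implementations average this quantity over
    local sliding windows. k1 and k2 are the commonly used default constants.
    """
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((v - mu_a) * (v - mu_a) for v in a) / n
    var_b = sum((v - mu_b) * (v - mu_b) for v in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Toy ion images scaled to [0, 1]: a central hot spot and its inverse.
reference = [[0.0, 0.1, 0.0],
             [0.1, 0.9, 0.1],
             [0.0, 0.1, 0.0]]
inverted = [[1.0 - v for v in row] for row in reference]

print(global_ssim(reference, reference))
print(global_ssim(reference, inverted))
```

An image compared with itself scores 1, and any structurally different image scores lower, which is why higher scores denote higher similarity to the reference m/z image.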
Besides the training of the autoencoder, the patch size has a greater influence on the performance of the random forest (RF) regression models. Supplementary Figure 8 shows that a larger patch size results in a greater number of pixels being considered when deriving the mean hypoxia value. As a consequence, the regression model can no longer account for the precise position of particularly small hypoxic spots. This effect degrades the performance of the RF, especially when training is carried out on fewer overlapping patches. The number of overlapping patches is defined by the step size, i.e., a step size of 1 generates the maximum number of overlapping patches. To assess the impact, the topmost-ranked features were compared to the reference mass-to-charge (m/z) value 998.472 at patch sizes of 3 and 5, with a fixed step size of either 1 or 2, in 10 RF-only runs, considering all cross-validation results. This comparison takes into account that a change in the step size influences the total number of patches available for training the RF. Supplementary Figure 9 shows that among the topmost-ranked features, more outliers with a low SSIM were picked up by the RF at a patch size of 5 than at a patch size of 3 when the step size was set to 1. At a step size of 2, the SSIM scores of experiments with a patch size of 5 deteriorated further in comparison to runs with a patch size of 3.
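The interplay of patch size, step size and mean hypoxia value described above can be made concrete with a small sketch. It assumes square patches, no padding at the image border, and a toy 7 × 7 annotation map; the function name `patch_means` and the toy data are hypothetical and do not reproduce the study's pipeline.

```python
def patch_means(annotation, patch_size, step=1):
    """Return ((row, col), mean value) for every patch of a 2D annotation map.

    Patches are square, anchored at their top-left pixel, and extracted
    without padding; a step of 1 yields the maximum number of overlaps.
    """
    rows, cols = len(annotation), len(annotation[0])
    patches = []
    for r in range(0, rows - patch_size + 1, step):
        for c in range(0, cols - patch_size + 1, step):
            values = [annotation[r + i][c + j]
                      for i in range(patch_size)
                      for j in range(patch_size)]
            patches.append(((r, c), sum(values) / len(values)))
    return patches


# Toy hypoxia annotation: a single hypoxic pixel in the centre of the map.
hypoxia = [[0.0] * 7 for _ in range(7)]
hypoxia[3][3] = 1.0

for patch_size in (3, 5):
    for step in (1, 2):
        patches = patch_means(hypoxia, patch_size, step)
        peak = max(mean for _, mean in patches)
        print(f"patch={patch_size} step={step} "
              f"n_patches={len(patches)} max_mean={peak:.3f}")
```

In this toy map, the single hypoxic pixel contributes 1/9 of a 3 × 3 patch mean but only 1/25 of a 5 × 5 patch mean, and a step size of 2 leaves far fewer patches to train on. Both effects point in the direction reported for the RF models.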

Convolutional variational autoencoder (ConvVAE)
In a ConvVAE, the encoder outputs the mean and variance of a latent space dimension to describe its distribution. This differs from a non-variational approach, in which the encoder outputs a direct mapping of the input. For comparison, the overall architecture and configuration of the non-variational approach were employed (see Methods for details). After training the ConvVAE, mean latent representations were extracted to train the RF regression model and subsequently rank them for association with hypoxia. Supplementary Figure 10 shows exemplary mean latent space representations of the ConvVAE from different runs, with the top-ranked hypoxia-associated feature shown on the left-hand side and one additional feature on the right-hand side. When comparing the recovered m/z values to the reference m/z value 998.472, the SSIM scores of the ConvVAE were found to be significantly lower than those of the proposed ConvAE (Supplementary Figure 11). This suggests that the mean latent features of the ConvVAE do not retain sufficient hypoxia information.
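The difference between sampling from the latent distribution and extracting its mean can be sketched as follows. The sketch assumes the common parameterization in which the encoder emits a mean and a log-variance per latent dimension and samples through the reparameterization trick; the source states only that mean and variance are output, so the log-variance form and the function names are assumptions.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Reparameterization trick: z = mean + sigma * eps, with eps ~ N(0, 1).

    Assumes the encoder emits log-variance, so sigma = exp(0.5 * log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def mean_representation(mean, log_var):
    """Deterministic latent representation: the mean only, variance ignored."""
    return list(mean)


rng = random.Random(0)
encoder_mean = [0.5, -1.0]     # hypothetical encoder output for one patch
encoder_log_var = [0.0, -2.0]  # hypothetical log-variances

print(sample_latent(encoder_mean, encoder_log_var, rng))      # stochastic
print(mean_representation(encoder_mean, encoder_log_var))     # deterministic
```

Using the mean representation, as in the comparison above, gives the downstream RF a deterministic input, but it discards whatever the model encoded in the variance.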

Redundancy of m/z values
Supplementary Figures 12 and 13 illustrate that even when the number of m/z values is drastically reduced (from 18,735 to 2,642 and 775 m/z values, respectively), the SSIM scores of the ConvAE approaches are higher than those of RFs alone. Due to the difference in pre-processing the samples (see Methods), the reference m/z value changed from 998.472 (18,735 m/z values) to 998.502 (2,642 m/z values) and 999.4807 (775 m/z values).

… of individual samples in latent space. a Latent feature with the highest association with hypoxia annotations according to random forest feature importance. b Latent feature with moderate hypoxia association according to random forest feature importance.

… a High hypoxia association, b moderate hypoxia association and c no hypoxia association according to random forest feature importance.

Structural similarity index measure (SSIM) of mass-to-charge (m/z) values recovered from different latent features to the reference m/z value 998.472 in the exemplary unsupervised convolutional autoencoder (ConvAE) run. Latent features #26 and #44 were found to be moderately associated with the hypoxia annotations. Latent feature #37 achieved the second-highest feature importance score and latent feature #56 was ranked highest. Higher scores denote higher similarity to the reference m/z value 998.472. Boxplots follow the Tukey style (see Methods), incorporating p value cutpoints: **** < 10⁻⁴, *** < 0.001, ** < 0.01, * < 0.05, ns >= 0.05. Groups were compared using two-sided Mann-Whitney U rank tests, where p values were corrected to control the false discovery rate.
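The significance annotation used in these boxplots can be sketched in a few lines. The source does not name the false discovery rate procedure, so the Benjamini-Hochberg step-up adjustment is assumed here; the function names and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significance_label(p):
    """Map a (corrected) p value to the cutpoints used in the captions."""
    if p < 1e-4:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p values
for p_raw, p_adj in zip(raw, benjamini_hochberg(raw)):
    print(p_raw, round(p_adj, 4), significance_label(p_adj))
```

The labels are assigned after correction, so a comparison that is nominally significant can still be reported as ns once multiple testing is accounted for.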

m/z id=3717, m/z value=1426.709

Supplementary Figure 4: Exemplary mass-to-charge (m/z) values that were distinctively associated with hypoxia by the semi-supervised convolutional autoencoder (ConvAE) approach.

… of adjusted input patch size in unsupervised convolutional autoencoder (ConvAE) runs. Apart from the patch size, all other parameters remained fixed. Shown are latent features which exhibited the highest feature importance score for hypoxia (left) and one other latent feature (right) in different autoencoder runs with a patch size x = 2, b patch size x = …, c patch size x = 5. Latent representations depicted similar characteristics independent of the patch size. A larger input patch size increases the latent image size.

Supplementary Figure 6: Effect of adjusted kernel and patch size in unsupervised convolutional autoencoder (ConvAE) runs. Apart from the kernel size and the patch size, all other parameters remained fixed. a, b Shown are latent features which exhibited the highest feature importance score for hypoxia (left) and one other latent feature (right) in different autoencoder runs with a patch size x = 5, kernel size = 4 and b patch size x = 7, kernel size = 6. A larger input patch size increases the latent image size, while a larger kernel size decreases it. c Structural similarity index measure (SSIM) of all identified hypoxia-associated features to the reference mass-to-charge (m/z) value 998.472 in exemplary runs with adjusted kernel sizes and the exemplary unsupervised ConvAE run shown in Fig. 6. Boxplots follow the Tukey style (see Methods), incorporating p value cutpoints: **** < 10⁻⁴, *** < 0.001, ** < 0.01, * < 0.05, ns >= 0.05. Groups were compared using two-sided Mann-Whitney U rank tests, where p values were corrected to control the false discovery rate.

Supplementary Figure 7: Example in which an adjusted kernel and patch size in one unsupervised convolutional autoencoder (ConvAE) run retained hypoxia-associated features according to the structural similarity index measure (SSIM). Apart from the kernel size and the patch size, all other parameters remained fixed. a Shown are latent features which exhibited the highest feature importance score for hypoxia (left) and one other latent feature (right) in an additional autoencoder run with patch size x = 5, kernel size = 4. A larger input patch size increases the latent image size, while a larger kernel size decreases it. b SSIM of all identified hypoxia-associated features to the reference mass-to-charge (m/z) value 998.472 in the exemplary run with adjusted kernel size and the exemplary unsupervised ConvAE run shown in Fig. 6. Boxplots follow the Tukey style (see Methods), incorporating p value cutpoints: **** < 10⁻⁴, *** < 0.001, ** < 0.01, * < 0.05, ns >= 0.05. Groups were compared using two-sided Mann-Whitney U rank tests, where p values were corrected to control the false discovery rate.

Supplementary Figure 9: Effect of increased patch size on random forest (RF) regression models trained on overlapping patches. Shown is the structural similarity index measure (SSIM) between the topmost-ranked features and the reference mass-to-charge (m/z) value 998.472 with varying patch sizes at a step size of either 1 (left-hand side of image), creating the maximum number of overlapping patches, or 2 (right-hand side of image), creating a reduced set of overlapping patches, in all three samples and cross-validation results of 10 RF-only runs. Higher scores denote higher similarity to the reference m/z value 998.472. Boxplots follow the Tukey style (see Methods), incorporating p value cutpoints: **** < 10⁻⁴, *** < 0.001, ** < 0.01, * < 0.05, ns >= 0.05. Groups were compared using two-sided Mann-Whitney U rank tests, where p values were corrected to control the false discovery rate.
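The statement that a larger patch size increases the latent image size while a larger kernel size decreases it follows from the standard output-size formula of a convolution. The sketch below assumes a single unpadded convolution with stride 1; the actual layer configuration is described in the Methods and may differ, so the numbers are illustrative only.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of one convolution along a single dimension.

    Standard formula: floor((input + 2 * padding - kernel) / stride) + 1.
    Stride 1 and zero padding are assumptions, not taken from the study.
    """
    if kernel_size > input_size + 2 * padding:
        raise ValueError("kernel does not fit into the (padded) input")
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Patch/kernel combinations mentioned in the captions.
for patch, kernel in [(3, 2), (5, 2), (5, 4), (7, 6)]:
    size = conv_output_size(patch, kernel)
    print(f"patch={patch} kernel={kernel} -> {size} x {size}")
```

Under these assumptions, increasing the patch size from 3 to 5 at a kernel size of 2 enlarges the output from 2 × 2 to 4 × 4, whereas pairing the larger patches with larger kernels (5 with 4, 7 with 6) brings it back to 2 × 2.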

… of mean latent features from different convolutional variational autoencoder (ConvVAE) runs. Shown are mean latent features which exhibited the highest feature importance score for hypoxia (left) and one other latent feature (right).

… analysis of 10 runs each using the proposed non-variational unsupervised convolutional autoencoder (ConvAE) shown in Fig. 7b versus an unsupervised convolutional variational autoencoder (ConvVAE) approach. Boxplots follow the Tukey style (see Methods), incorporating p value cutpoints: **** < 10⁻⁴, *** < 0.001, ** < 0.01, * < 0.05, ns >= 0.05. Groups were compared using two-sided Mann-Whitney U rank tests, where p values were corrected to control the false discovery rate. a Structural similarity index measure (SSIM) of all identified hypoxia-associated features to the reference mass-to-charge (m/z) value 998.472 per sample in 10 individual runs each. Number of hypoxia-associated m/z values that were identified by both approaches in 10 runs.

… analysis of 10 runs each of convolutional autoencoder (ConvAE) and random forest (RF) only approaches, with the total number of features being reduced to 2,642 m/z values. Boxplots follow the Tukey style (see Methods), incorporating p value cutpoints: **** < 10⁻⁴, *** < 0.001, ** < 0.01, * < 0.05, ns >= 0.05. Groups were compared using two-sided Mann-Whitney U rank tests, where p values were corrected to control the false discovery rate. a Structural similarity index measure (SSIM) of all identified hypoxia-associated features to the reference mass-to-charge (m/z) value 998.502 per sample in 10 individual runs each. c Number of hypoxia-associated m/z values that were identified by the three approaches in 10 runs.

… analysis of 10 runs each of convolutional autoencoder (ConvAE) and random forest (RF) only approaches, with the total number of features being reduced to 775 m/z values. Boxplots follow the Tukey style (see Methods), incorporating p value cutpoints: **** < 10⁻⁴, *** < 0.001, ** < 0.01, * < 0.05, ns >= 0.05. Groups were compared using two-sided Mann-Whitney U rank tests, where p values were corrected to control the false discovery rate. a Structural similarity index measure (SSIM) of all identified hypoxia-associated features to the reference mass-to-charge (m/z) value 999.4807 per sample in 10 individual runs each. c Number of hypoxia-associated m/z values that were identified by the three approaches in 10 runs.

… picking on multiple samples. Shown is the mean intensity of all pixels in the range of mass-to-charge (m/z) values 809.30 to 809.50 per sample. Row #1 shows the mean raw spectra per sample. Row #2 shows the peaks derived from the mean spectra of a given sample. Row #3 illustrates the final TIC-normalized peaks for all samples. a For Samples 5, 4, 1 and 2, the sample-specific peaks match best with the mean m/z value 809.387. b For Sample 3, the sample-specific peaks match best with the mean m/z value 809.400. The two peaks around 809.387 and 809.400 are likely to denote mass shifts. As the true mass is unknown, both peaks are kept for further analysis.
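The two operations referred to in this caption, total ion current (TIC) normalization and matching sample-specific peaks to a mean m/z value, can be sketched as follows. Scaling each spectrum to a unit sum and matching by smallest absolute m/z difference are assumptions; the pipeline described in the Methods may rescale or apply tolerances differently, and the example intensities are hypothetical.

```python
def tic_normalize(intensities):
    """Divide every intensity by the spectrum's total ion current (TIC)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no signal to normalize")
    return [value / total for value in intensities]


def nearest_peak(mz, candidate_peaks):
    """Return the candidate m/z value closest to the observed m/z value."""
    return min(candidate_peaks, key=lambda peak: abs(peak - mz))


spectrum = [2.0, 6.0, 12.0]          # hypothetical peak intensities
normalized = tic_normalize(spectrum)
print(normalized, sum(normalized))

mean_peaks = [809.387, 809.400]      # the two mean m/z values from the caption
print(nearest_peak(809.390, mean_peaks))
print(nearest_peak(809.398, mean_peaks))
```

Because normalization rescales each spectrum by a single factor, relative peak heights within a pixel are preserved while intensities become comparable across pixels and samples.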